Correia Consultant Agency
Introduction
Starbucks’ gargantuan $26 billion of revenue in 2019 is resounding evidence that Americans, above everything, love their coffee. They prove it time and time again, most recently during the current pandemic: some, unable to relinquish their daily ‘bucks, waited hours on end in long drive-thru lines to get their fix when coffee shops were closed for in-person operations.
It is no surprise, then, that Starbucks owns more than 8,000 stores across the US, and continues to grow every day. These numbers reflect a company that is about more than just coffee: it has a significant influence on US society and culture through (often controversial) initiatives such as removing religious references from holiday-themed cups. In addition, Starbucks is known for treating its employees extremely well by providing health coverage, tuition coverage, and 401(k) plans, which again reinforces its highly regarded brand.
As such, the astounding popularity of the chain in the country and the prospect of further expansion raise important questions. What can the current distribution of Starbucks stores tell us about societal factors across the country? Which factors should the brand consider when expanding into new locations? How can companies like Starbucks, which pride themselves on a positive social and environmental outlook, incorporate such values into their corporate strategy, especially in the current pandemic?
Overview of Analysis
In order to answer these questions, we sought to understand the relationship between state-level social and economic factors, such as income and unemployment, and the distribution of Starbucks locations in the US. Furthermore, we wanted to understand what company behavior and decision-making could look like with a more socially minded approach, rather than one driven by profit alone. Starbucks shops, after all, can be significant contributors to social good on a localized level by providing stable jobs that open up avenues for growth and education.
As such, in this project, we conduct a consulting case study of Starbucks, analysing what insights can be gained from the current distribution of stores in the US and providing socially minded recommendations for new locations into which the brand can expand. We start this endeavor with a spatial analysis of Starbucks stores across the US, exploring the relationship between store density and income/unemployment level in each state. We then complement this analysis with a text-based sentiment analysis of recent tweets mentioning Starbucks, and explore how these sentiments vary across locations. Finally, we use these variables to conduct a clustering analysis of states in the US, in order to determine the different “types” of states in which Starbucks operates. We proceed to use these clusters to identify the ideal locations for the brand to build new stores.
Data
In order to conduct this analysis, we collected the following data:
- The Starbucks Locations Worldwide dataset, available on Kaggle and provided by Starbucks Corporation, including a record for every Starbucks or subsidiary store location that was in operation during February 2017.
- The US Household Income Statistics dataset, available on Kaggle and provided by the Golden Oak Research group, containing information on US cities’ average income.
- The US Unemployment Rate by County dataset, available on Kaggle and provided by the US Department of Labor’s Bureau of Labor Statistics, containing unemployment and population data at the US county level from 1990-2016.
- The most recent 18,000 tweets containing the string “Starbucks” or “starbucks” in their text, scraped using the rtweet package to interface with the Twitter API.
Check the links in the bullets above for access to each of the data sources!
Distribution of Starbucks Across the US
The first step of our analysis was to understand how Starbucks stores are distributed across the US:
Looking at the plot above, which displays the distribution and density of Starbucks locations across the US, we can see that Starbucks locations are heavily clustered along the coasts, and less prominent within states in the Rocky Mountain and Midwestern regions of the country. This suggests a correlation between income and Starbucks store location, as we see a concentration of stores in the richer parts of the country, such as California and several states in the Northeast. This is corroborated by the heavy concentration of Starbucks stores in the major cities within each state, as opposed to being spread out across the state’s counties. Looking at the map, one can easily identify major cities such as New York, Los Angeles, Houston, Miami and Boston as Starbucks hotspots.
Given this evidence of high spatial concentration of Starbucks stores and an apparent correlation between income and number of stores, we wanted to understand whether it gave rise to any patterns regarding sentiment towards Starbucks. Specifically, we theorized that areas with more Starbucks stores or higher income should have a more positive sentiment towards the brand. In the following section, we test this hypothesis using text-based sentiment analysis of recent tweets that mention Starbucks.
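The link between income and store counts is read off the map here. One way to put a number on it would be a Pearson correlation across states. The following is a minimal Python sketch of that calculation, not the code used in this project; the function name `pearson_r` and the four toy values are hypothetical and are not taken from the datasets described above.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy values for four made-up states, NOT real data.
avg_income_thousands = [50, 60, 70, 80]
store_counts = [100, 150, 400, 350]

r = pearson_r(avg_income_thousands, store_counts)
print(f"Pearson r = {r:.3f}")
```

In practice, store counts would probably need to be normalized by population before being compared with income, since large states have more stores regardless of wealth.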
Consumer Sentiments Towards Starbucks Across the US
After using the rtweet package to scrape the text and location of the latest 18,000 tweets mentioning Starbucks, we calculated the proportion of negative and positive words (as defined by the NRC Lexicon) in each state.
As mentioned previously, our initial hypothesis was that states with a higher density of Starbucks stores would tweet more positive and fewer negative opinions. However, in the plots below, the states with the most Starbucks stores, such as California, Texas, Florida, and New York, all have a lower proportion of positive words and a higher proportion of negative words in their tweets than other states in the US. Conversely, states with fewer Starbucks stores, such as Wisconsin, Iowa, Kansas, and Mississippi, had a higher proportion of positive words and a lower proportion of negative words. This suggests that, judging by consumer sentiment alone, it may be more beneficial for Starbucks to build new stores in states where it is not yet heavily established.
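To make the per-state calculation concrete, here is a minimal Python sketch rather than the R code used in this project. It assumes the proportion is the share of all words in a state's tweets that appear in each word list; the report does not spell out the denominator. The two tiny word sets are hypothetical stand-ins for the NRC Lexicon, and the tweets are invented.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins for the NRC positive/negative word lists.
POSITIVE = {"love", "great", "delicious"}
NEGATIVE = {"hate", "awful", "slow"}

def sentiment_by_state(tweets):
    """Return {state: {"positive": share, "negative": share}}.

    `tweets` is an iterable of (state, text) pairs. Shares are computed
    over all word tokens in that state's tweets (an assumption).
    """
    counts = defaultdict(lambda: {"pos": 0, "neg": 0, "total": 0})
    for state, text in tweets:
        for word in re.findall(r"[a-z']+", text.lower()):
            c = counts[state]
            c["total"] += 1
            if word in POSITIVE:
                c["pos"] += 1
            if word in NEGATIVE:
                c["neg"] += 1
    return {
        state: {"positive": c["pos"] / c["total"],
                "negative": c["neg"] / c["total"]}
        for state, c in counts.items() if c["total"] > 0
    }

# Invented example tweets, not scraped data.
toy_tweets = [
    ("IA", "I love my Starbucks latte"),
    ("IA", "great service today"),
    ("CA", "the line was slow and awful"),
    ("CA", "love it"),
]
shares = sentiment_by_state(toy_tweets)
for state, s in sorted(shares.items()):
    print(state, s)
```

A different denominator, such as only the words matched by the lexicon, would change the absolute values but not the idea.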
Clustering by State
Motivated by the potential that Starbucks might want to expand into areas it has not heavily invested in, we sought to understand how we could segment states into categories in order to determine how Starbucks might strategize differently in each of them.
State-Level Factors
We thus applied a k-means clustering algorithm to US states, using the following variables to categorize them:
- Average Proportion of Negative Sentiment in Tweets
- Average Income
- Average Unemployment Rate
- Number of Starbucks
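These four variables sit on very different scales (a proportion, dollars, a percentage, and a count), and k-means relies on Euclidean distance, so unscaled income would dominate the result. The report does not say whether the variables were scaled; the Python sketch below shows one common option, column-wise z-scores, as an assumption rather than a description of what was done. The function name and the toy rows are hypothetical.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Rescale each column to mean 0 and (population) standard deviation 1."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        tuple((x - m) / s if s > 0 else 0.0
              for x, m, s in zip(row, means, stds))
        for row in rows
    ]

# Invented rows: (negative-word share, avg. income, unemployment %, stores).
toy_states = [
    (0.10, 70000, 8.0, 2800),
    (0.12, 50000, 5.0, 100),
    (0.20, 60000, 4.0, 10),
    (0.14, 45000, 9.0, 60),
]
scaled = standardize(toy_states)
for row in scaled:
    print(tuple(round(v, 2) for v in row))
```

Note that this uses the population standard deviation; R's `scale()` divides by the sample standard deviation instead, which gives slightly different values but the same relative ordering.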
State-Level Clusters
Using the elbow plot below, we found that 5 clusters captured most of the variation between states while keeping the results interpretable; adding further clusters yielded diminishing returns.
Visualization of Clusters
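For readers unfamiliar with the elbow method, the Python sketch below shows the quantity such a plot is built from: the total within-cluster sum of squares (WSS) for each candidate number of clusters. It is an illustration on invented two-dimensional points, not this project's code. The farthest-point initialization is a choice made here to keep the example deterministic; standard k-means implementations usually use random starts.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point
    farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points]

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centers, labels, WSS)."""
    centers = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, wss

# Invented points forming three tight groups.
toy_points = [(0, 0), (0, 1), (1, 0),
              (10, 10), (10, 11), (11, 10),
              (20, 0), (20, 1), (21, 0)]

elbow = {k: kmeans(toy_points, k)[2] for k in range(1, 5)}
for k, wss in elbow.items():
    print(k, round(wss, 2))
```

On this toy data the WSS drops sharply up to three clusters and barely improves afterwards, which is the "elbow" one looks for when choosing the number of clusters.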
The results of the clustering algorithm are displayed in the visualization below:
Cluster Characteristics
The matrix below shows how each cluster is distributed across the four variables. Cluster 1, which is composed solely of California, is characterized by having the highest number of Starbucks stores by a large margin (a total of 2821!). It’s also interesting to note that California has both a high average income and a high unemployment rate, which is unusual, since the two are inversely correlated in other states. Cluster 2 represents the states with lower income than most but also relatively low unemployment, a special case of the less wealthy states. Cluster 3, composed solely of Vermont, is characterized by having the highest negative consumer sentiment. Cluster 4 represents the wealthiest states in the country, with a high average income and a mid-range unemployment rate. Finally, Cluster 5 represents the most impoverished states in the US, with a high unemployment rate, a low average income and a low number of Starbucks stores.
Final List of States
Having segmented states by their characteristics, we decided to pick one from each cluster in order to determine what a socially-minded strategy for Starbucks within that group of states might look like. To do so, we decided to pick the centermost state within each cluster, as the state “most representative” of that cluster. For California and Vermont, this was trivial, since they were the only states within their respective clusters. For the other three clusters, we calculated the distance from each state to its respective center and picked the states with the minimum distance. From this analysis, we concluded the final 5 states, each representative of their own cluster, are:
- Virginia
- Vermont
- Tennessee
- Iowa
- California
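The "centermost state" selection described above can be sketched as follows. This is a hedged Python illustration, not the project's code: it assumes each cluster center is the mean of its members (as in k-means) and that distance is Euclidean. The function name, the state labels "A" to "G", and the coordinates are all hypothetical.

```python
from math import dist

def centermost_members(names, points, labels):
    """For each cluster label, return the name of the member closest
    to that cluster's centroid (the mean of its members)."""
    clusters = {}
    for name, point, label in zip(names, points, labels):
        clusters.setdefault(label, []).append((name, point))

    representatives = {}
    for label, members in clusters.items():
        coords = [point for _, point in members]
        centroid = [sum(col) / len(coords) for col in zip(*coords)]
        closest = min(members, key=lambda m: dist(m[1], centroid))
        representatives[label] = closest[0]
    return representatives

# Invented states and (already scaled) features, for illustration only.
toy_names = ["A", "B", "C", "D", "E", "G", "F"]
toy_points = [(0, 0), (1, 0), (5, 0),
              (10, 10), (10, 12), (10, 13),
              (20, 20)]
toy_labels = [0, 0, 0, 1, 1, 1, 2]

reps = centermost_members(toy_names, toy_points, toy_labels)
print(reps)
```

A cluster with a single member, like California or Vermont in the analysis above, trivially returns that member, since it coincides with its own centroid.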
Clustering by County
County-Level Factors
- Average Income
- Average Unemployment Rate
- Number of Starbucks
Step-by-Step Process
- Cluster Analysis
- Creating clusters based on appropriate number of centers
- Determine the most suitable cluster of counties
Determining Number of Clusters for Each State
Virginia
Cluster characteristics, explain why we chose cluster 4
Vermont
Cluster Analysis
Cluster characteristics, explain why we chose cluster 1
Tennessee
Cluster Analysis
Visualizing clusters
Cluster characteristics, explain why we chose cluster 4
Iowa
Cluster Analysis
Visualizing clusters
Cluster characteristics, explain why we chose cluster 5
California
Cluster Analysis
Visualizing clusters
Cluster characteristics, explain why we chose cluster 1
Final Recommendations
Summarize overall process and thinking and impact of starbucks. Show the data table of all important counties from clusters we chose:
| County | Avg. Income (USD) | Num. Starbucks | Avg. Unemployment Rate (%) |
|---|---|---|---|
| Fresno | 55523.57 | 36 | 12.326804 |
| Glenn | 53148.00 | 0 | 11.998625 |
| Kern | 49580.61 | 39 | 11.487972 |
| Madera | 49355.76 | 5 | 11.890378 |
| Merced | 49976.11 | 7 | 13.486942 |
| Modoc | 43340.33 | 0 | 10.394502 |
| Monterey | 68913.00 | 1 | 9.826804 |
| Plumas | 64924.83 | 0 | 11.346392 |
| San Joaquin | 71281.69 | 24 | 10.629553 |
| Santa Cruz | 82669.83 | 7 | 11.217886 |
| Shasta | 53369.18 | 10 | 9.644330 |
| Stanislaus | 48900.86 | 7 | 11.919244 |
| Tehama | 48808.75 | 1 | 9.508935 |
| Tulare | 45336.69 | 17 | 13.573540 |
| Yuba | 55171.50 | 0 | 12.325086 |
| Adams | 65864.50 | 0 | 6.569547 |
| Appanoose | 49003.50 | 0 | 5.678395 |
| Benton | 57531.00 | 0 | 6.154527 |
| Buchanan | 63804.00 | 0 | 7.088080 |
| Butler | 61836.00 | 0 | 5.995722 |
| Calhoun | 43333.00 | 0 | 7.597668 |
| Carroll | 55611.00 | 0 | 5.938277 |
| Cherokee | 55162.00 | 0 | 6.286043 |
| Clarke | 43619.50 | 0 | 7.021777 |
| Clay | 46965.00 | 0 | 6.565042 |
| Clayton | 54832.50 | 0 | 5.518210 |
| Clinton | 59001.00 | 0 | 6.144743 |
| Crawford | 51775.50 | 0 | 6.761908 |
| Dallas | 78098.67 | 0 | 7.271241 |
| Delaware | 55472.17 | 0 | 5.291101 |
| Fayette | 55887.25 | 0 | 6.797021 |
| Franklin | 50078.00 | 0 | 5.943186 |
| Greene | 58107.83 | 0 | 6.827746 |
| Grundy | 64870.50 | 0 | 6.497376 |
| Hardin | 58296.00 | 0 | 6.925958 |
| Henry | 51271.00 | 0 | 6.525775 |
| Jackson | 47535.00 | 0 | 6.540742 |
| Jasper | 59793.11 | 1 | 6.459512 |
| Jefferson | 56939.00 | 0 | 6.596809 |
| Linn | 73450.25 | 9 | 7.200540 |
| Lyon | 54713.00 | 0 | 5.473609 |
| Marion | 47388.00 | 0 | 7.014973 |
| Marshall | 60222.00 | 0 | 6.233670 |
| Osceola | 49948.00 | 0 | 5.801299 |
| Pocahontas | 50181.50 | 0 | 6.703086 |
| Union | 43619.50 | 0 | 6.206117 |
| Warren | 60983.96 | 7 | 5.924982 |
| Washington | 49947.33 | 0 | 5.938779 |
| Webster | 54712.17 | 0 | 6.443786 |
| Chittenden | 83283.50 | 0 | 3.387654 |
| Brunswick | 35873.00 | 0 | 7.469872 |
| Cumberland | 46083.00 | 0 | 6.302316 |
| Grayson | 32076.00 | 0 | 7.063253 |
| Greensville | 33192.67 | 0 | 5.819667 |
| Halifax | 29411.00 | 0 | 8.662019 |
| Lee | 31678.50 | 0 | 7.173886 |
| Northampton | 39718.25 | 0 | 6.739979 |
| Prince Edward | 54858.00 | 0 | 6.603667 |
| Scott | 46810.00 | 0 | 5.713129 |
| Smyth | 39226.00 | 0 | 8.147333 |
| Carter | 41428.75 | 0 | 6.840901 |
| Chester | 24221.00 | 0 | 7.115124 |
| Fentress | 39775.00 | 0 | 9.083333 |
| Hardeman | 26503.00 | 0 | 7.497459 |
| Lawrence | 45854.50 | 0 | 7.462405 |
| Lewis | 41536.00 | 0 | 8.101558 |
| Macon | 40533.00 | 0 | 7.106605 |
| Monroe | 48462.00 | 0 | 6.983337 |
| Obion | 45863.00 | 0 | 7.635802 |
| Rhea | 38409.50 | 0 | 8.395988 |
| Unicoi | 47091.25 | 0 | 8.077778 |
| Van Buren | 41919.00 | 0 | 7.148339 |
| Weakley | 40819.00 | 0 | 6.972840 |
Limitations and Conclusion
Paragraph here
Citations
List of important packages/data sets